Parallel and cascaded deep neural networks for text-to-speech synthesis
نویسندگان
چکیده
An investigation of cascaded and parallel deep neural networks for speech synthesis is conducted. In these systems, suprasegmental linguistic features (syllable-level and above) are processed separately from segmental features (phone-level and below). The suprasegmental component of the networks learns compact distributed representations of high-level linguistic units without any segmental influence. These representations are then integrated into a frame-level system using a cascaded or a parallel approach. In the cascaded network, suprasegmental representations are used as input to the framelevel network. In the parallel network, segmental and suprasegmental features are processed separately and concatenated at a later stage. These experiments are conducted with a standard set of high-dimensional linguistic features as well as a hand-pruned one. It is observed that hierarchical systems are consistently preferred over the baseline feedforward systems. Similarly, parallel networks are preferred over cascaded networks.
منابع مشابه
Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks
Quality of text-to-speech voices built from noisy recordings is diminished. In order to improve it we propose the use of a recurrent neural network to enhance acoustic parameters prior to training. We trained a deep recurrent neural network using a parallel database of noisy and clean acoustics parameters as input and output of the network. The database consisted of multiple speakers and divers...
متن کاملTTS synthesis with bidirectional LSTM based recurrent neural networks
Feed-forward, Deep neural networks (DNN)-based text-tospeech (TTS) systems have been recently shown to outperform decision-tree clustered context-dependent HMM TTS systems [1, 4]. However, the long time span contextual effect in a speech utterance is still not easy to accommodate, due to the intrinsic, feed-forward nature in DNN-based modeling. Also, to synthesize a smooth speech trajectory, th...
متن کاملKinematic Synthesis of Parallel Manipulator via Neural Network Approach
In this research, Artificial Neural Networks (ANNs) have been used as a powerful tool to solve the inverse kinematic equations of a parallel robot. For this purpose, we have developed the kinematic equations of a Tricept parallel kinematic mechanism with two rotational and one translational degrees of freedom (DoF). Using the analytical method, the inverse kinematic equations are solved for spe...
متن کاملشبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملGlobal Syllable Vectors for Building TTS Front-End with Deep Learning
Recent vector space representations of words have succeeded in capturing syntactic and semantic regularities. In the context of text-to-speech (TTS) synthesis, a front-end is a key component for extracting multi-level linguistic features from text, where syllable acts as a link between lowand high-level features. This paper describes the use of global syllable vectors as features to build a fro...
متن کامل